We describe a new scheme for optimizing many-electron trial wave functions by minimizing the 
unreweighted variance of the energy using stochastic integration and correlated-sampling techniques. 
The scheme is restricted to parameters that are linear in the exponent of a Jastrow correlation factor, 
which are the most important parameters in the wave functions we use. The scheme is highly efficient 
and allows us to investigate the parameter space more closely than has been possible before. We 
search for multiple minima of the variance in the parameter space and compare the wave functions 
obtained using reweighted and unreweighted variance minimization. 
I. INTRODUCTION 



Accurate many-body wave functions are essential to the variational and diffusion quantum Monte Carlo (VMC and 
DMC) methods, as the wave function controls both the statistical efficiency and the accuracy of these techniques. 
Optimizing many-body wave functions is perhaps the most important technical issue facing practitioners of these 
quantum Monte Carlo (QMC) techniques today, and it consumes large quantities of human and computing resources. 
Wave-function optimization schemes have usually involved minimizing either the variational energy or its variance. 



Although it is generally believed that wave functions corresponding to the minimum energy have more desirable 
properties, variance minimization has been very widely used because it has proved easier to design robust minimization 
techniques for this purpose. The scheme introduced in this article involves minimizing the unreweighted variance. 
We describe a new method for evaluating this quantity, which greatly accelerates the optimization of parameters that 
occur in a linear fashion in the exponent of a Jastrow factor. These are, in general, the most important parameters in 
QMC trial wave functions. The optimization step does not involve a sum over electron configurations, which means 
that we can use very large numbers of configurations. The unreweighted variance is in fact a quartic function of the 
linear parameters in the Jastrow factor, and the minima of multidimensional quartic functions can be located very 
rapidly. The efficiency of our scheme has enabled us to explore the minimization procedure and the parameter space 
in detail, and to investigate the possible existence of multiple minima. 

The distinction between the reweighted or true variance and the unreweighted variance is explained in Sec. II. In 
Sec. III we describe our accelerated scheme for calculating the unreweighted variance. In Sec. IV we use our new 
method to study the unreweighted variance in parameter space. The minima of the reweighted and unreweighted 
variance need not coincide, and in Sec. V we investigate which minimum corresponds to the lower energy. We discuss 
the sampling of configuration space and the flexibility of the trial wave function in Secs. VI and VII. In Secs. IX and 
X we compare the efficiency of the "standard" and accelerated variance-minimization methods, both in theory and 
practice. Finally, we draw our conclusions in Sec. XI. 

Hartree atomic units (a.u.) are used throughout, in which the Dirac constant, the magnitude of the electronic 
charge, the electronic mass, and $4\pi$ times the permittivity of free space are unity: $\hbar = |e| = m_e = 4\pi\epsilon_0 = 1$. All of 
our QMC calculations were carried out using the CASINO package. 



II. THE ENERGY AND ITS VARIANCE 

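Consider a real trial wave function $\Psi(\mathbf{R})$, where $\mathbf{R}$ is a point in the electron configuration space. In VMC the 
energy is written as 

$$E = \frac{\int \Psi(\mathbf{R})^2 E_L(\mathbf{R})\, d\mathbf{R}}{\int \Psi(\mathbf{R})^2\, d\mathbf{R}}, \qquad (1)$$

where the local energy, $E_L$, is 

$$E_L(\mathbf{R}) = \Psi(\mathbf{R})^{-1} \hat{H} \Psi(\mathbf{R}), \qquad (2)$$

and $\hat{H}$ is the Hamiltonian. The variance of the energy is 

$$\sigma^2 = \frac{\int \Psi(\mathbf{R})^2 \left( E_L(\mathbf{R}) - E \right)^2 d\mathbf{R}}{\int \Psi(\mathbf{R})^2\, d\mathbf{R}}. \qquad (3)$$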

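We write the trial wave function as $\Psi^{\{\alpha\}}(\mathbf{R})$, to denote that it depends on a set of free parameters, $\{\alpha\}$. Throughout 
this article, we confine our attention to the optimization of parameters in the Jastrow factor. The nodal surface of the 
trial wave function is independent of such parameters. Consider a set of $N_C$ configurations $\{\mathbf{R}\}$ distributed according 
to $(\Psi^{\{\alpha_0\}}(\mathbf{R}))^2$ for some fixed parameter set $\{\alpha_0\}$. The variance $\sigma^2$ is then estimated for any given parameter set 
$\{\alpha\}$ using a correlated-sampling procedure, which gives rise to the reweighted variance, 

$$\sigma_w^2 = \frac{T^{\{\alpha\}}_{\{\alpha_0\}} \sum_{\mathbf{R}} \left( E_L^{\{\alpha\}}(\mathbf{R}) - E_w \right)^2 W^{\{\alpha\}}_{\{\alpha_0\}}(\mathbf{R})}{\left( T^{\{\alpha\}}_{\{\alpha_0\}} \right)^2 - \sum_{\mathbf{R}} \left( W^{\{\alpha\}}_{\{\alpha_0\}}(\mathbf{R}) \right)^2}, \qquad (4)$$

where the reweighted energy is 

$$E_w = \frac{1}{T^{\{\alpha\}}_{\{\alpha_0\}}} \sum_{\mathbf{R}} E_L^{\{\alpha\}}(\mathbf{R})\, W^{\{\alpha\}}_{\{\alpha_0\}}(\mathbf{R}), \qquad (5)$$

which is an estimate of $E$, and the total weight is 

$$T^{\{\alpha\}}_{\{\alpha_0\}} = \sum_{\mathbf{R}} W^{\{\alpha\}}_{\{\alpha_0\}}(\mathbf{R}), \qquad (6)$$

and the weights $W$ are 

$$W^{\{\alpha\}}_{\{\alpha_0\}}(\mathbf{R}) = \left( \frac{\Psi^{\{\alpha\}}(\mathbf{R})}{\Psi^{\{\alpha_0\}}(\mathbf{R})} \right)^2. \qquad (7)$$

The unreweighted variance as a function of parameter set $\{\alpha\}$ is defined to be 

$$\sigma_u^2 = \frac{1}{N_C - 1} \sum_{\mathbf{R}} \left( E_L^{\{\alpha\}}(\mathbf{R}) - E_u \right)^2, \qquad (8)$$

where the unreweighted energy is 

$$E_u = \frac{1}{N_C} \sum_{\mathbf{R}} E_L^{\{\alpha\}}(\mathbf{R}). \qquad (9)$$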

The reweighted and unreweighted variances are identical when the same set of configurations is used and $\{\alpha\} = \{\alpha_0\}$. 
However, for any given $\{\alpha_0\}$ they are different functions of $\{\alpha\}$, and there is no reason to expect that their minima 
coincide with each other, or that either minimum should coincide with that of the (reweighted) energy. 
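
The relationship between the two estimators can be illustrated numerically. The following is a minimal sketch, assuming that the reweighted variance takes the standard weighted-sample form $T \sum W (E_L - E_w)^2 / (T^2 - \sum W^2)$ with weights equal to the squared ratio of the trial wave functions. The function names and the `log_psi_ratio` input are hypothetical and are not taken from CASINO.

```python
import math

def unreweighted_variance(e_local):
    """Sample variance of the local energies, every configuration weighted equally."""
    n = len(e_local)
    e_u = sum(e_local) / n                      # unreweighted energy
    return sum((e - e_u) ** 2 for e in e_local) / (n - 1)

def reweighted_variance(e_local, log_psi_ratio):
    """Weighted variance of the local energies.

    log_psi_ratio[r] is ln|Psi_alpha(R_r) / Psi_alpha0(R_r)| (assumed input),
    so the weight of configuration r is exp(2 * log_psi_ratio[r]).
    """
    w = [math.exp(2.0 * x) for x in log_psi_ratio]
    t = sum(w)                                  # total weight
    e_w = sum(wi * e for wi, e in zip(w, e_local)) / t
    num = t * sum(wi * (e - e_w) ** 2 for wi, e in zip(w, e_local))
    return num / (t * t - sum(wi * wi for wi in w))

# Toy local energies (illustrative numbers only).
e_local = [-1.00, -1.20, -0.90, -1.10, -1.05]

# With unit weights (alpha = alpha_0) the two estimators coincide.
same = reweighted_variance(e_local, [0.0] * 5)

# With non-uniform weights they generally differ.
diff = reweighted_variance(e_local, [0.3, -0.2, 0.1, 0.0, -0.4])
```

In this toy example the unit-weight case reproduces the unreweighted value, while the non-uniform weights shift the estimate; in a real calculation the local energies themselves would also change with the parameters.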

Both $\sigma_w^2$ and $\sigma_u^2$ are non-negative, and are zero when $\Psi^{\{\alpha\}}$ is an eigenstate of $\hat{H}$. The reweighted and unreweighted 
variances are therefore reasonable cost functions for wave-function optimizations. The reweighted energy is also a 
reasonable cost function. However, the problem with the reweighted energy and variance is that the weights $W$ 
may vary rapidly as the parameters change, especially for large systems, which leads to instabilities in optimization 
procedures. It can be shown that the wave function used to generate the configuration set corresponds to a stationary 
point of $E_u$ (for perfect sampling). In what follows we will mainly be interested in optimizing linear parameters in 
the Jastrow factor, and in this case we have proved that the wave function used to generate the configuration set 
corresponds to the global maximum of the unreweighted energy. The unreweighted energy is clearly not a suitable 
cost function. From these considerations we conclude that the cost function with the most suitable mathematical 
properties for the stable optimization of wave functions within the correlated-sampling approach is the unreweighted 
variance. 

The usual variance-minimization procedure is to generate a set of electron configurations $\{\mathbf{R}\}$ distributed according 
to $(\Psi^{\{\alpha_0\}}(\mathbf{R}))^2$ using VMC, and then to minimize the reweighted or unreweighted energy variance over this set. Since 
the variance landscape depends on the distribution of configurations, several cycles of configuration generation and 
optimization are normally carried out, with the optimized wave function from the previous cycle being used in each 
VMC configuration-generation phase. We usually iterate several times and choose the wave function that gives the 
lowest variational energy. In the limit of perfect sampling, the reweighted variance is equal to the actual variance, and 
is therefore independent of the configuration distribution, so that the optimized parameters would not change over 
successive cycles of reweighted variance minimization. This is not the case for unreweighted variance minimization; 
nevertheless, by carrying out a number of cycles, a "self-consistent" parameter set may be obtained. 
III. ACCELERATED EVALUATION OF THE UNREWEIGHTED VARIANCE 
A. The Slater-Jastrow wave function 

Let $\Psi$ be a Slater-Jastrow wave function for a many-body system: 

$$\Psi(\mathbf{R}) = \exp[J(\mathbf{R})]\, S(\mathbf{R}), \qquad (10)$$

where $\exp[J]$ is the Jastrow factor, which contains free parameters to be determined by an optimization method, and 
$S$ is the Slater wave function, which may be an expansion in several determinants of single-particle orbitals. 
Suppose that $J$ contains linear parameters $\alpha_1, \ldots, \alpha_P$, that is, 

$$J(\mathbf{R}) = \sum_{i=1}^{P} f_i(\mathbf{R})\, \alpha_i + J_0(\mathbf{R}), \qquad (11)$$

where $f_1, \ldots, f_P$ and $J_0$ are known functions of $\mathbf{R}$, which depend upon the particular form of Jastrow factor used and 
do not contain any free parameters. We use the form of Jastrow factor described in detail in the cited reference, which contains 
linear parameters. However, some of the terms have a finite extent in space and the associated cutoff lengths must 
appear nonlinearly in the Jastrow factor. These cutoff lengths can be set on physical grounds or optimized using 
small numbers of parameters and configurations and the standard variance-minimization procedure, but their values 
cannot be obtained using the accelerated scheme described here. 

B. Derivation of the quartic polynomial 

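The local energy for the Slater-Jastrow wave function of Eq. (10) is 

$$E_L(\mathbf{R}) = -\frac{1}{2} \sum_{i=1}^{P} \sum_{j=1}^{P} g^{(2)}_{ij}(\mathbf{R})\, \alpha_i \alpha_j - \frac{1}{2} \sum_{i=1}^{P} g^{(1)}_i(\mathbf{R})\, \alpha_i - \frac{1}{2} g^{(0)}(\mathbf{R}) + V(\mathbf{R}), \qquad (12)$$

where $V$ is the potential energy and 

$$g^{(2)}_{ij}(\mathbf{R}) = \nabla f_i \cdot \nabla f_j, \qquad (13)$$
$$g^{(1)}_i(\mathbf{R}) = 2 \nabla f_i \cdot \nabla J_0 + \nabla^2 f_i + 2 \frac{\nabla S}{S} \cdot \nabla f_i, \qquad (14)$$
$$g^{(0)}(\mathbf{R}) = |\nabla J_0|^2 + \nabla^2 J_0 + 2 \frac{\nabla S}{S} \cdot \nabla J_0 + \frac{\nabla^2 S}{S}, \qquad (15)$$

and we note that $g^{(2)}_{ij} = g^{(2)}_{ji}$. The square of the local energy is given by 

$$E_L^2(\mathbf{R}) = \sum_{i=1}^{P} \sum_{j=1}^{P} \sum_{k=1}^{P} \sum_{l=1}^{P} G^{(4)}_{ijkl}(\mathbf{R})\, \alpha_i \alpha_j \alpha_k \alpha_l + \sum_{i=1}^{P} \sum_{j=1}^{P} \sum_{k=1}^{P} G^{(3)}_{ijk}(\mathbf{R})\, \alpha_i \alpha_j \alpha_k$$
$$\qquad + \sum_{i=1}^{P} \sum_{j=1}^{P} G^{(2)}_{ij}(\mathbf{R})\, \alpha_i \alpha_j + \sum_{i=1}^{P} G^{(1)}_i(\mathbf{R})\, \alpha_i + G^{(0)}(\mathbf{R}), \qquad (16)$$

where 

$$G^{(4)}_{ijkl}(\mathbf{R}) = \frac{g^{(2)}_{ij}\, g^{(2)}_{kl}}{4}, \qquad (17)$$
$$G^{(3)}_{ijk}(\mathbf{R}) = \frac{g^{(2)}_{ij}\, g^{(1)}_k}{2}, \qquad (18)$$
$$G^{(2)}_{ij}(\mathbf{R}) = \frac{g^{(1)}_i\, g^{(1)}_j}{4} - g^{(2)}_{ij} \left( V(\mathbf{R}) - \frac{g^{(0)}}{2} \right), \qquad (19)$$
$$G^{(1)}_i(\mathbf{R}) = -g^{(1)}_i \left( V(\mathbf{R}) - \frac{g^{(0)}}{2} \right), \qquad (20)$$
$$G^{(0)}(\mathbf{R}) = \left( V(\mathbf{R}) - \frac{g^{(0)}}{2} \right)^2. \qquad (21)$$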
(Note that $G^{(4)}_{ijkl} = G^{(4)}_{jikl} = G^{(4)}_{ijlk} = G^{(4)}_{klij}$, $G^{(3)}_{ijk} = G^{(3)}_{jik}$, and $G^{(2)}_{ij} = G^{(2)}_{ji}$.) 

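Suppose the VMC method is used to generate a set of $N_C$ points in configuration space, $\{\mathbf{R}\}$, which are distributed 
according to the square of an approximate trial wave function. For any quantity $A(\mathbf{R})$, let 

$$\bar{A} = \frac{1}{N_C} \sum_{\mathbf{R}} A(\mathbf{R}) \qquad (22)$$

be the average of $A(\mathbf{R})$ over the set of $N_C$ configurations. The unreweighted variance may be written as 

$$\sigma_u^2 = \frac{N_C}{N_C - 1} \left( \sum_{i=1}^{P} \sum_{j=1}^{P} \sum_{k=1}^{P} \sum_{l=1}^{P} K^{(4)}_{ijkl}\, \alpha_i \alpha_j \alpha_k \alpha_l + \sum_{i=1}^{P} \sum_{j=1}^{P} \sum_{k=1}^{P} K^{(3)}_{ijk}\, \alpha_i \alpha_j \alpha_k \right.$$
$$\qquad \left. + \sum_{i=1}^{P} \sum_{j=1}^{P} K^{(2)}_{ij}\, \alpha_i \alpha_j + \sum_{i=1}^{P} K^{(1)}_i\, \alpha_i + K^{(0)} \right), \qquad (23)$$

where 

$$K^{(4)}_{ijkl} = \bar{G}^{(4)}_{ijkl} - \frac{\bar{g}^{(2)}_{ij}\, \bar{g}^{(2)}_{kl}}{4}, \qquad (24)$$
$$K^{(3)}_{ijk} = \bar{G}^{(3)}_{ijk} - \frac{\bar{g}^{(2)}_{ij}\, \bar{g}^{(1)}_k}{2}, \qquad (25)$$
$$K^{(2)}_{ij} = \bar{G}^{(2)}_{ij} - \frac{\bar{g}^{(1)}_i\, \bar{g}^{(1)}_j}{4} + \bar{g}^{(2)}_{ij} \left( \bar{V} - \frac{\bar{g}^{(0)}}{2} \right), \qquad (26)$$
$$K^{(1)}_i = \bar{G}^{(1)}_i + \bar{g}^{(1)}_i \left( \bar{V} - \frac{\bar{g}^{(0)}}{2} \right), \qquad (27)$$
$$K^{(0)} = \bar{G}^{(0)} - \left( \bar{V} - \frac{\bar{g}^{(0)}}{2} \right)^2. \qquad (28)$$

(Note that $K^{(4)}_{ijkl} = K^{(4)}_{jikl} = K^{(4)}_{ijlk} = K^{(4)}_{klij}$, $K^{(3)}_{ijk} = K^{(3)}_{jik}$, and $K^{(2)}_{ij} = K^{(2)}_{ji}$.) The unreweighted variance is quartic 
in the set of free parameters. Once the values of $K^{(n)}$ have been computed, there is no need to perform any further 
summations over the set of configurations during the optimization of the parameters. 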
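
The claim that the unreweighted variance is an exact quartic in the linear parameters can be checked numerically on synthetic data. The sketch below is an illustration only: it assumes the local energy has the form $-\frac{1}{2}\sum_{ij} g^{(2)}_{ij}\alpha_i\alpha_j - \frac{1}{2}\sum_i g^{(1)}_i\alpha_i + v$ with $v = V - g^{(0)}/2$, fills the per-configuration quantities with random numbers rather than real Jastrow derivatives, and uses hypothetical function names.

```python
import random
from itertools import product

def local_energy(alpha, g2, g1, v):
    """E_L = -1/2 sum_ij g2_ij a_i a_j - 1/2 sum_i g1_i a_i + v, where v = V - g0/2."""
    P = range(len(alpha))
    quad = sum(g2[i][j] * alpha[i] * alpha[j] for i in P for j in P)
    lin = sum(g1[i] * alpha[i] for i in P)
    return -0.5 * quad - 0.5 * lin + v

def quartic_coefficients(g2s, g1s, vs):
    """Coefficients K4..K0: averages of products minus products of averages."""
    n, P = len(vs), range(len(g1s[0]))
    def avg(f):
        return sum(f(r) for r in range(n)) / n
    g2b = {(i, j): avg(lambda r: g2s[r][i][j]) for i, j in product(P, repeat=2)}
    g1b = {i: avg(lambda r: g1s[r][i]) for i in P}
    vb = avg(lambda r: vs[r])
    K4 = {(i, j, k, l): (avg(lambda r: g2s[r][i][j] * g2s[r][k][l])
                         - g2b[i, j] * g2b[k, l]) / 4
          for i, j, k, l in product(P, repeat=4)}
    K3 = {(i, j, k): (avg(lambda r: g2s[r][i][j] * g1s[r][k]) - g2b[i, j] * g1b[k]) / 2
          for i, j, k in product(P, repeat=3)}
    K2 = {(i, j): avg(lambda r: g1s[r][i] * g1s[r][j] / 4 - g2s[r][i][j] * vs[r])
          - g1b[i] * g1b[j] / 4 + g2b[i, j] * vb
          for i, j in product(P, repeat=2)}
    K1 = {i: avg(lambda r: -g1s[r][i] * vs[r]) + g1b[i] * vb for i in P}
    K0 = avg(lambda r: vs[r] ** 2) - vb ** 2
    return K4, K3, K2, K1, K0

def variance_from_quartic(alpha, K, n):
    """Evaluate the quartic polynomial; no sum over configurations is needed here."""
    K4, K3, K2, K1, K0 = K
    a = alpha
    total = K0 + sum(c * a[i] for i, c in K1.items())
    total += sum(c * a[i] * a[j] for (i, j), c in K2.items())
    total += sum(c * a[i] * a[j] * a[k] for (i, j, k), c in K3.items())
    total += sum(c * a[i] * a[j] * a[k] * a[l] for (i, j, k, l), c in K4.items())
    return n / (n - 1) * total

# Synthetic "configurations": random symmetric g2, random g1 and v (not real physics).
random.seed(0)
n, P = 40, 3
g2s = []
for _ in range(n):
    m = [[random.gauss(0, 1) for _ in range(P)] for _ in range(P)]
    g2s.append([[0.5 * (m[i][j] + m[j][i]) for j in range(P)] for i in range(P)])
g1s = [[random.gauss(0, 1) for _ in range(P)] for _ in range(n)]
vs = [random.gauss(-1, 0.3) for _ in range(n)]

K = quartic_coefficients(g2s, g1s, vs)
alpha = [0.3, -0.7, 0.5]
energies = [local_energy(alpha, g2s[r], g1s[r], vs[r]) for r in range(n)]
mean = sum(energies) / n
direct = sum((e - mean) ** 2 for e in energies) / (n - 1)
quartic = variance_from_quartic(alpha, K, n)
```

The two values `direct` and `quartic` agree to rounding error for any choice of `alpha`, which is the property that lets the optimizer avoid revisiting the configurations. This sketch stores the full, unsymmetrized coefficient arrays for clarity; it does not exploit the index symmetries.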

Throughout this article the potential energy is assumed to be a local operator, so the local potential energy 
is independent of the wave-function parameters. When the variance-minimization algorithm is applied to systems 
containing pseudoatoms, the change in the local potential energy due to the nonlocal part of the pseudopotential is 
neglected. Not only does this greatly improve the speed of the variance-minimization process, but it also appears to 
improve the stability of the algorithm. 



C. Evaluating the least-squares function during an optimization 

1. Accumulating G and g 

The values of $\bar{g}$, $\bar{G}$, and $\bar{V}$ are accumulated during a VMC simulation by keeping a running total of the values of $g(\mathbf{R})$, 
$G(\mathbf{R})$, and $V(\mathbf{R})$ encountered at each step of the random walk; there is no need to store data for each configuration. 
The accumulated elements of $G$ are stored in a one-dimensional array and, furthermore, the symmetries of $G$ are 
exploited in order to minimize the length of this vector. The numbers of $G^{(4)}$, $G^{(3)}$, $G^{(2)}$, $G^{(1)}$, and $G^{(0)}$ elements to 
be calculated and stored are 

$$N_G^{(4)} = \frac{1}{2} \left( \frac{P(P+1)}{2} \right) \left( \frac{P(P+1)}{2} + 1 \right), \qquad (29)$$
$$N_G^{(3)} = \frac{P^2 (P+1)}{2}, \qquad (30)$$
$$N_G^{(2)} = \frac{P(P+1)}{2}, \qquad (31)$$
$$N_G^{(1)} = P, \qquad (32)$$
$$N_G^{(0)} = 1, \qquad (33)$$

respectively. $G^{(4)}_{ijkl}$ is symmetric with respect to $i$ and $j$, and is also symmetric with respect to $k$ and $l$. In order to 
label the independent elements of $G^{(4)}$, one can replace $(i, j)$ by a single index $I$ that takes $P(P+1)/2$ different values. 
Likewise, $(k, l)$ can be replaced by a single index $J$ that takes $P(P+1)/2$ different values. $G^{(4)}_{IJ}$ is still symmetric 
with respect to $I$ and $J$; hence $(I, J)$ can be replaced by a single index $K$ which takes $N_G^{(4)}$ different values, where 
$N_G^{(4)}$ is given in Eq. (29). This is the method by which the elements of $G^{(4)}$ are indexed in practice. Counting and 
indexing the elements of $G^{(3)}$, $G^{(2)}$, and $G^{(1)}$ are relatively straightforward. The total number of $G$ elements grows 
as $O(P^4)$. Storing these coefficients represents the memory bottleneck for the accelerated optimization procedure. 
With $P = 30$ parameters (a typical number), 122,791 $G$ elements must be stored. With $P = 100$ parameters (a large 
number), 13,263,926 elements must be stored. Alternatively, the number of elements to be stored could be reduced 
by using the same strategy as that suggested in Sec. III C 2 for evaluating the unreweighted variance. This would not 
affect the number of elements that have to be evaluated, however, and it may slow down the VMC calculation even 
further. The saving in memory would typically be a factor of between 2.5 and 3, which is insignificant, given the 
$O(P^4)$ scaling of the method. 
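
The element counts quoted above, and the two-stage pair indexing, can be reproduced with a few lines of code. This is a sketch under assumptions: the triangular packing formula used in `pair_index` and the zero-based indices are illustrative choices, not necessarily the layout used in the actual implementation, and all function names are hypothetical.

```python
def pair_index(i, j):
    """Map an unordered pair of zero-based indices to one packed index.

    Uses triangular packing: for lo <= hi the index is hi*(hi+1)/2 + lo.
    """
    lo, hi = min(i, j), max(i, j)
    return hi * (hi + 1) // 2 + lo

def g4_index(i, j, k, l):
    """Packed index of G4_ijkl: (i,j) -> I, (k,l) -> J, then (I,J) -> K."""
    return pair_index(pair_index(i, j), pair_index(k, l))

def n_g_elements(P):
    """Numbers of independent G4, G3, G2, G1, G0 elements and their total."""
    pairs = P * (P + 1) // 2
    counts = {4: pairs * (pairs + 1) // 2, 3: pairs * P, 2: pairs, 1: P, 0: 1}
    return counts, sum(counts.values())

counts_30, total_30 = n_g_elements(30)     # typical number of parameters
counts_100, total_100 = n_g_elements(100)  # large number of parameters
```

Because `g4_index` is invariant under the swaps $i \leftrightarrow j$, $k \leftrightarrow l$, and $(i,j) \leftrightarrow (k,l)$, every symmetry-equivalent element is accumulated into the same slot of the one-dimensional array.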



2. Evaluating the least-squares function 

Before the start of the optimization, the coefficient of each different product of parameters is computed and the 
coefficients are stored in a one-dimensional array. This allows the unreweighted variance to be evaluated extremely 
rapidly. The set of possible products of four of the parameters is $\{\alpha_i \alpha_j \alpha_k \alpha_l : i \le j \le k \le l\}$, and similarly for the 
products of three and two parameters. So the unreweighted variance can be written as 

$$\sigma_u^2 = \frac{N_C}{N_C - 1} \left( \Gamma^{(0)} + \sum_{i=1}^{P} \alpha_i \left( \Gamma^{(1)}_i + \sum_{j=i}^{P} \alpha_j \left( \Gamma^{(2)}_{ij} + \sum_{k=j}^{P} \alpha_k \left( \Gamma^{(3)}_{ijk} + \sum_{l=k}^{P} \alpha_l\, \Gamma^{(4)}_{ijkl} \right) \right) \right) \right), \qquad (34)$$

where the $\Gamma^{(n)}$ are defined in terms of $K^{(n)}$ (see below) and are stored as one-dimensional arrays. The number of 
elements of $\Gamma^{(n)}$ is given by the number of distinct products of $n$ parameters, which can be shown to be 

$$N_\Gamma^{(n)} = \binom{P + n - 1}{n}, \qquad (35)$$

while the total number of elements of the $\Gamma$ arrays is 

$$N_\Gamma = \binom{P + 4}{4}, \qquad (36)$$

which increases as $O(P^4)$. For $P = 30$ parameters, the number of terms that must be summed over to obtain the 
unreweighted variance is 46,376, while for $P = 100$ parameters, the number of terms is 4,598,126. 

For each $\{i, j, k, l\}$ with $i \le j \le k \le l$, $\Gamma^{(4)}_{ijkl}$ is equal to the sum of $K^{(4)}_{ijkl}$ over all distinct permutations of $\{i, j, k, l\}$. 
$\Gamma^{(3)}$, $\Gamma^{(2)}$, $\Gamma^{(1)}$, and $\Gamma^{(0)}$ are constructed in a similar fashion. 
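
The construction of the packed coefficients by summing over distinct index permutations, and the resulting term counts, can be sketched as follows. The example uses a randomly filled third-order coefficient array as a stand-in for $K^{(3)}$; the dictionary-based storage and the function names are assumptions made for readability, not the one-dimensional layout described above.

```python
import random
from itertools import combinations_with_replacement, permutations, product
from math import comb

def n_gamma(P, n):
    """Number of distinct products of n parameters chosen from P (with repetition)."""
    return comb(P + n - 1, n)

def gamma_from_k(K, P, n):
    """Packed coefficient for each ordered tuple i <= j <= ...: sum of K over
    all distinct permutations of that tuple."""
    return {idx: sum(K[p] for p in set(permutations(idx)))
            for idx in combinations_with_replacement(range(P), n)}

# Stand-in for K3: random numbers, deliberately without any symmetry.
random.seed(1)
P = 3
K3 = {idx: random.gauss(0, 1) for idx in product(range(P), repeat=3)}
G3 = gamma_from_k(K3, P, 3)

alpha = [0.4, -1.1, 0.8]
# Full triple sum over all (i, j, k) versus the packed sum over i <= j <= k.
full = sum(c * alpha[i] * alpha[j] * alpha[k] for (i, j, k), c in K3.items())
packed = sum(c * alpha[i] * alpha[j] * alpha[k] for (i, j, k), c in G3.items())

# Total number of packed terms for orders 0 to 4.
total_30 = sum(n_gamma(30, n) for n in range(5))
total_100 = sum(n_gamma(100, n) for n in range(5))
```

Here `full` and `packed` agree to rounding error, and the totals reproduce the term counts quoted in the text for 30 and 100 parameters.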
3. Derivatives of the least-squares function 



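Derivatives of the unreweighted variance are given by 

$$\frac{\partial \sigma_u^2}{\partial \alpha_n} = \frac{N_C}{N_C - 1} \left( \sum_{i=1}^{P} \sum_{j=1}^{P} \sum_{k=1}^{P} M^{(3)}_{ijk}(n)\, \alpha_i \alpha_j \alpha_k + \sum_{i=1}^{P} \sum_{j=1}^{P} M^{(2)}_{ij}(n)\, \alpha_i \alpha_j + \sum_{i=1}^{P} M^{(1)}_i(n)\, \alpha_i + M^{(0)}(n) \right), \qquad (37)$$

where 

$$M^{(3)}_{ijk}(n) = K^{(4)}_{nijk} + K^{(4)}_{injk} + K^{(4)}_{ijnk} + K^{(4)}_{ijkn}, \qquad (38)$$
$$M^{(2)}_{ij}(n) = K^{(3)}_{nij} + K^{(3)}_{inj} + K^{(3)}_{ijn}, \qquad (39)$$
$$M^{(1)}_i(n) = K^{(2)}_{ni} + K^{(2)}_{in}, \qquad (40)$$
$$M^{(0)}(n) = K^{(1)}_n. \qquad (41)$$

In practice, derivatives are evaluated as 

$$\frac{\partial \sigma_u^2}{\partial \alpha_n} = \frac{N_C}{N_C - 1} \left( \Lambda^{(0)}(n) + \sum_{i=1}^{P} \alpha_i \left( \Lambda^{(1)}_i(n) + \sum_{j=i}^{P} \alpha_j \left( \Lambda^{(2)}_{ij}(n) + \sum_{k=j}^{P} \alpha_k\, \Lambda^{(3)}_{ijk}(n) \right) \right) \right), \qquad (42)$$

where the $\Lambda(n)$ are defined in terms of the $M(n)$ in an analogous fashion to the definition of $\Gamma$ in terms of $K$ in 
Sec. III C 2. The total number of elements of $\Lambda$ is 

$$N_\Lambda = P \binom{P + 3}{3}, \qquad (43)$$

which grows as $O(P^4)$. The $\Lambda$ arrays used to evaluate the gradient of the unreweighted variance may be somewhat 
larger than the $G$ arrays. 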



D. Minimizing the variance 



Ideally, one would like to use an optimization method that enables one to find the global minimum of the variance 
with respect to the wave-function parameters. Unfortunately, existing variance-minimization algorithms generally use 
numerical optimization methods which, if started close to a particular local minimum, will always converge to that 
minimum. However, in the case of the quartic unreweighted variance in the space of linear Jastrow parameters, it is 
relatively easy to carry out an extensive search for the global minimum. 

Standard methods for minimizing a function of many variables include the method of steepest descents, the 
conjugate-gradients method, and the BFGS method. Of these three methods, we have found the BFGS algorithm to 
converge most rapidly for a wide variety of test systems. 

Along any given line in the space of linear Jastrow parameters the unreweighted variance is a quartic polynomial of 
a single variable. The method by which the variance along a line can be re-expressed as a quartic polynomial is given 
in an appendix. A quartic polynomial of a single variable has at most two minima on the real axis. The gradient of a 
quartic function is a cubic, whose three roots can be obtained analytically (Ref. 7); hence it is straightforward to locate the 
global minimum of the unreweighted variance along the line. 
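
The analytic line minimization can be sketched in a few lines. The code below is a minimal illustration, assuming the five coefficients of the quartic along the line are already known and that the leading coefficient is positive; it solves the cubic derivative with the textbook depressed-cubic formulas (Cardano's formula for one real root, the trigonometric form for three) and picks the stationary point with the lowest value. The function names are hypothetical.

```python
import math

def real_cubic_roots(a, b, c, d):
    """Real roots of a*t^3 + b*t^2 + c*t + d = 0 (a != 0) via the depressed cubic."""
    b, c, d = b / a, c / a, d / a
    p = c - b * b / 3.0
    q = 2.0 * b ** 3 / 27.0 - b * c / 3.0 + d
    shift = -b / 3.0                        # t = u + shift, with u^3 + p*u + q = 0
    disc = (q / 2.0) ** 2 + (p / 3.0) ** 3
    if disc > 0.0:                          # one real root: Cardano's formula
        s = math.sqrt(disc)
        def cbrt(x):
            return math.copysign(abs(x) ** (1.0 / 3.0), x)
        return [cbrt(-q / 2.0 + s) + cbrt(-q / 2.0 - s) + shift]
    if p == 0.0:                            # disc <= 0 and p == 0 implies a triple root
        return [shift]
    r = 2.0 * math.sqrt(-p / 3.0)           # three real roots: trigonometric form
    arg = max(-1.0, min(1.0, 3.0 * q / (p * r)))
    phi = math.acos(arg) / 3.0
    return [r * math.cos(phi - 2.0 * math.pi * k / 3.0) + shift for k in range(3)]

def quartic_line_minimum(c4, c3, c2, c1, c0):
    """Global minimiser of c4*t^4 + c3*t^3 + c2*t^2 + c1*t + c0, assuming c4 > 0."""
    if c4 <= 0.0:
        raise ValueError("this sketch assumes a positive leading coefficient")
    def f(t):
        return (((c4 * t + c3) * t + c2) * t + c1) * t + c0
    stationary = real_cubic_roots(4.0 * c4, 3.0 * c3, 2.0 * c2, c1)
    t_best = min(stationary, key=f)         # lowest stationary point is the global minimum
    return t_best, f(t_best)

# A quartic with two minima: the deeper one lies at negative t.
t_two, f_two = quartic_line_minimum(1.0, 0.0, -2.0, 0.5, 0.0)
# A quartic with a single minimum.
t_one, f_one = quartic_line_minimum(1.0, 0.0, 1.0, -2.0, 0.0)
```

Since a quartic with positive leading coefficient tends to infinity in both directions, comparing its value at the real stationary points is sufficient to identify the global minimum along the line.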

In order to search the parameter space for the global minimum of the variance with respect to the linear Jastrow 
parameters, we first perform a BFGS minimization. Starting from this minimum we choose directions at random 
and use the analytic line-minimization technique to search for a second minimum, lower than the first. If a second 
minimum is found then BFGS is used to converge to the new minimum, and the process is repeated. 



IV. THE NATURE OF THE UNREWEIGHTED VARIANCE 



A. Linear Jastrow parameters 



We used the method described in Sec. III D to search for minima when optimizing the linear Jastrow parameters in 
the SiH4 molecule, the all-electron neon atom, a 16-atom cell of diamond-structure pseudosilicon subject to periodic 
boundary conditions, and an electron-hole gas. However, even sampling up to $10^7$ random directions, multiple minima 
were only found when the configuration space was deliberately sampled extremely poorly. 

FIG. 1: The unreweighted variance $\sigma_u^2$ for an all-electron neon atom plotted against one of the linear Jastrow parameters. All 
of the other linear parameters are set to zero. Different numbers of configurations were used to calculate the quartic coefficients 
of the unreweighted variance. In each case the configurations were distributed according to the square of the Hartree-Fock 
wave function. The Jastrow factor contained a total of 24 linear parameters. The curves for $10^5$ and $10^6$ configurations are 
indistinguishable in the figure. 

Plots of the unreweighted variance against the value of one of the linear parameters are shown for an all-electron 
neon atom in Fig. 1. It can be seen that the unreweighted variance converges to a limit as the number of configurations 
is increased. There is only one minimum in every case. 

Plots of the unreweighted variance for an all-electron neon atom against the value of a parameter for an extremely 
poor sampling of configuration space are shown in Fig. 2. When few configurations were used ($N_C = 40$), it was 
possible to find two minima of the variance along lines in parameter space, proving that nonglobal minima can exist. 
However, it was also found that increasing the number of configurations tended to prevent the occurrence of two 
minima along lines in parameter space. 



B. Nonlinear Jastrow parameters 



Plots of the unreweighted variance of the SiH4 molecule against a nonlinear Jastrow parameter (the cutoff length for 
the electron-electron correlation term) are shown in Figs. 3 and 4. The behavior of the unreweighted variance is far 
worse when the cutoff length is varied than when a linear parameter is varied: the variance has multiple minima along 
lines in parameter space and there is some noise in the variance, especially for poor samplings of configuration space. 
It can be seen in Fig. 4 that the optimized cutoff lengths obtained using $10^2$ or $10^3$ configurations are considerably 
shorter than the cutoff lengths obtained using $10^4$ or $10^5$ configurations. In the former case the cutoff lengths are 
trapped in the nonglobal minimum that can be seen in Fig. 3, while the deeper minimum is reached in the latter case. 
The Jastrow factor used to produce Figs. 3 and 4 is such that the local energy is continuous when an electron-electron 
separation passes through the cutoff length. If a Jastrow factor that gives rise to a discontinuous local energy at the 
cutoff length were to be used, the variance would be an extremely noisy function of the cutoff length, especially for 
thin samplings of configuration space. Optimization of the cutoff lengths for such Jastrow factors has been found 
to be very difficult. The existence of multiple minima when cutoff lengths are optimized suggests that it may be 
worthwhile performing variance-minimization calculations using several different initial cutoff lengths. 
FIG. 2: The unreweighted variance $\sigma_u^2$ for an all-electron neon atom plotted against the change in one of the linear Jastrow 
parameters. All the parameters in the Jastrow factor are set to large, random values. Different numbers of VMC-generated 
configurations were used to calculate the quartic coefficients of the unreweighted variance. In each case the configurations 
were distributed according to the square of a Slater-Jastrow wave function, where the Jastrow factor contained the random 
parameters, so that the resulting distribution was very unlike the ground-state distribution. The Jastrow factor contained a 
total of 72 linear parameters. The Slater wave function contained Hartree-Fock orbitals. 

FIG. 3: The unreweighted variance $\sigma_u^2$ for an SiH4 molecule (with a Hartree-Fock silicon pseudopotential) plotted against 
the cutoff length for the electron-electron terms in the Jastrow factor, $L_u$. The Jastrow factor is such that the local energy 
is continuous when an electron-electron separation passes through the cutoff length. Different numbers of VMC-generated 
configurations were used to calculate the unreweighted variance. All of the linear Jastrow parameters are set to zero. In 
each case the configurations were distributed according to the square of the Hartree-Fock wave function. The Jastrow factor 
contained a total of 56 linear parameters, plus three cutoff lengths. 

FIG. 4: This figure is the same as Fig. 3 except that all of the parameters in the Jastrow factor (including the cutoff lengths) 
have been optimized. 



V. MINIMA OF THE VARIANCE AND THE ENERGY 
A. The reweighted and unreweighted variance 

1. Plots of the reweighted and unreweighted variance 

Plots of the reweighted and unreweighted variances for an all-electron neon atom against one of the linear Jastrow 
parameters are shown in Figs. 5 and 6 for small and large numbers of configurations. The reweighted and unreweighted 
variances have their minima in different places, with their values at the minima being different from one another. 
The variance is a smooth function of the linear Jastrow parameter in each case, but there are multiple minima of the 
reweighted variance along lines in parameter space, demonstrating that nonglobal minima can exist. Furthermore, 
the minima of the reweighted variance are not as sharply defined as those of the unreweighted variance. Minimization 
of the unreweighted variance is therefore more likely to be rapid and stable. 



2. Quality of variance-minimization results 

The outcomes of reweighted and unreweighted variance-minimization calculations are shown in Table I. For a 
relatively sparse sampling of configuration space, reweighted variance minimization is pathologically unstable, while 
unreweighted variance minimization is perfectly well-behaved. For a dense sampling of configuration space the two 
methods give very similar results, and there is no evidence that the reweighted variance-minimization algorithm 
performs any better than the unreweighted algorithm, or vice versa. 



B. Coincidence of the minima of the energy and the variance 

As is clearly demonstrated in Fig. 7, the self-consistent minimum of the unreweighted variance does not necessarily 
coincide with the minimum of the VMC energy. On the other hand, for a high-quality Jastrow factor, the minima of 
the unreweighted variance and energy are generally in close agreement, as is shown in Fig. 8. We have no evidence, for 
all-electron atoms at least, that any significant advantage could be obtained by optimizing linear Jastrow parameters 
in a good Jastrow factor using an energy-minimization method. It can also be seen in Fig. 8 that the reweighted 
energy follows the actual VMC energy data closely (the statistical error in the reweighted energy at the optimal 
wave function is 0.001 a.u.). This implies that, provided enough configurations are used, the wave function could be 
optimized by reweighted energy minimization. 

A plot of the VMC energy variance against the change in a linear Jastrow parameter from its optimal value in an 
all-electron neon atom is shown in Fig. 9. As one would expect, the reweighted variance matches the actual variance, 
unlike the unreweighted variance; however, there is no significant difference between the minima of the variance and 
the unreweighted variance (and hence the energy). 

FIG. 5: The reweighted and unreweighted variance for an all-electron neon atom plotted against the change in the value of a 
linear Jastrow parameter, $\alpha_1$. Plots are shown for the case in which all the parameters are set to zero and the case in which all 
the parameters have been optimized. The set of 100 configurations used to calculate the variance were distributed according 
to the square of the Hartree-Fock wave function. The Jastrow factor contained a total of 27 linear parameters. 

FIG. 6: This figure is the same as Fig. 5 except that $10^4$ configurations were used to calculate the variance. 
FIG. 7: The VMC energy against the change in the value of a linear Jastrow parameter $\alpha_1$ from the value determined by 
self-consistent unreweighted variance minimization. The Jastrow factor was chosen to be poor, with no electron-nucleus or 
electron-electron-nucleus terms, and the same electron-electron terms were used for both parallel and antiparallel spins. There 
is only one optimizable parameter: $\alpha_1$. 

FIG. 8: The VMC energy of an all-electron neon atom against the change in the value of a linear Jastrow parameter $\alpha_1$ 
from the value determined by self-consistent unreweighted variance minimization. Specifically, the Jastrow factor was the 
best all-electron neon Jastrow factor described in the cited reference. The reweighted and unreweighted energies calculated using $8 \times 10^5$ 
configurations distributed according to the square of the optimized wave function are also plotted. The statistical error bars in 
the VMC data are smaller than the symbols. 

TABLE I: Results of reweighted and unreweighted variance-minimization calculations for an all-electron neon atom. $P$ is the 
number of linear parameters in the Jastrow factor and $N_C$ is the number of configurations used to perform the optimization. 
(Long VMC runs were used to obtain the energies and variances shown in the table.) Only linear Jastrow parameters were 
optimized. The VMC energy and variance for cycle 1 are estimates of the Hartree-Fock energy and variance, and are the same 
for each $P$ and $N_C$. 



We have also studied the question of the coincidence of the minima of the energy, the variance, and the self-consistent 
unreweighted variance using a variety of model systems, for which the integrals could be performed exactly. The 
models consisted of one-dimensional potential wells and various trial wave functions with a single variable parameter. 
We studied several examples for a single particle and an example for two identical, interacting fermions. These 
examples showed that the global minima of the energy, the variance and the self-consistent unreweighted variance 
can be different. In all cases studied the parameters optimized by self-consistent unreweighted variance minimization 
gave lower energies than the parameters optimized by reweighted or "true" variance minimization. Furthermore, 
in many cases, the parameters from the self-consistent unreweighted variance minimum coincided exactly with the 
energy-minimized parameters, suggesting that some underlying principle was at work. 

VI. THE SAMPLING OF CONFIGURATION SPACE 

A. The number of configurations 

The VMC energy for a neon-atom Slater-Jastrow wave function is plotted against the number of configurations 
used to optimize the Jastrow factor in Fig. 10. It can be seen that the wave-function quality improves very rapidly, 
then saturates at between $5 \times 10^2$ and $10^4$ configurations, for both small and large numbers of parameters. For very 
small numbers of configurations, the optimizations give pathological results, especially when the more flexible Jastrow 
factor is used. 

Results obtained using reweighted variance minimization are also shown in Fig. 10. The reweighted 
variance-minimization process was pathologically unstable for fewer than about $10^3$ configurations. For larger numbers of 
configurations the energies obtained are in good agreement with the results of unreweighted variance minimization. 

B. The distribution of configurations 

The unreweighted variance for an all-electron neon atom is plotted against a linear Jastrow parameter for three 
different configuration distributions in Fig. 11. The configurations were distributed according to (i) the square of 
the Hartree-Fock wave function, as is usually the case in the first cycle of a variance-minimization calculation; (ii) 
the square of an optimized Slater-Jastrow wave function, as is usually the case in the second and subsequent cycles; 
and (iii) the square of a Slater-Jastrow wave function in which the Jastrow factor was chosen to be poor. Although 
the variance looks different in each case, the positions of the minima coincide almost exactly for the Slater and 
optimized Slater-Jastrow distributions. Even for the poor wave function, the minimum of the variance is reasonably 
close to the more accurately determined optimum. This is consistent with our observation that, in general, the only 
significant improvement to the quality of a Jastrow factor occurs in the first cycle of a series of unreweighted 
variance-minimization calculations: starting from the Hartree-Fock wave function, the self-consistent solution is usually reached 
in the first cycle. 

FIG. 9: The VMC variance for an all-electron neon atom against the change in the value of a linear Jastrow parameter 
$\alpha_1$ from its value determined by self-consistent unreweighted variance minimization. Specifically, the Jastrow factor was the 
best all-electron neon Jastrow factor described in the cited reference. The reweighted and unreweighted variances calculated using $8 \times 10^5$ 
configurations distributed according to the square of the optimized wave function are also plotted. Where the error bars in the 
VMC data cannot be seen, they are smaller than the symbols. 

FIG. 10: The VMC energy of an all-electron neon atom against the number of configurations used to optimize the linear 
Jastrow parameters in an unreweighted variance minimization. Eight optimization cycles were performed for each number of 
configurations in order to ensure that self-consistency was achieved. The Slater wave function contained Hartree-Fock orbitals. 
The error bars in the VMC data are smaller than the symbols. 

FIG. 11: The unreweighted variance for an all-electron neon atom plotted against the value of the linear Jastrow parameter 
$\alpha_1$ for three different configuration distributions: the square of the Hartree-Fock wave function, the square of an optimized 
Slater-Jastrow wave function and the square of a poor Slater-Jastrow wave function. $10^6$ configurations were used to calculate 
the unreweighted variance. The Jastrow factor contained a total of 36 linear parameters. The Slater wave function contained 
Hartree-Fock orbitals. 

VII. THE FLEXIBILITY OF THE JASTROW FACTOR 

The VMC energy of neon is plotted against the number of linear parameters used in the Jastrow factor in Fig. 12. 
The results illustrate the futility of attempting to optimize too many parameters. The quality of the optimized wave 
function depends on the number of configurations used to perform the optimization, especially when the number 
of parameters in the wave function is either very small or very large. However, there appears to be an advantage 
in using more than $10^4$ configurations only when a very large number of parameters are to be optimized. 

It should be reemphasized that the problems which occur when large numbers of parameters are optimized are 
caused by mismatches between the minima of the unreweighted variance and the energy, and not by the introduction 
of local minima into the variance landscape. 

VIII. LIMITING OF CONFIGURATION WEIGHTS 

It has been suggested that variance-minimization calculations are disproportionately affected by "outlying" 
configurations, whose energies deviate substantially from the mean energy. In particular, the local energy diverges in the 
vicinity of the nodal surface of the trial wave function, so configurations in this region are especially problematic. 
Such configurations are relatively rare when the nodes are fixed, as is the case when only Jastrow parameters are 
optimized, but the problem can be far more serious when parameters that affect the nodal surface are optimized using 
a fixed sampling of configuration space. 

FIG. 12: The VMC energy of an all-electron neon atom against the number of parameters in the Jastrow factor. Different 
numbers of configurations were used to carry out the unreweighted variance-minimization calculations. Six optimization cycles 
were performed in order to guarantee self-consistency. Very long VMC simulations were carried out using the optimized Jastrow 
factors in order to obtain the energies plotted in the graph. The VMC error bars are smaller than the symbols. The Slater 
wave function contained Hartree-Fock orbitals. 

We have studied a smooth scheme for removing outlying configurations from the optimization process. Let us define 
the configuration "effective weight" to be 



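$$W'(\mathbf{R}) = \frac{1}{2}\left[1 - \tanh\left(\frac{\left|E_L^{\alpha}(\mathbf{R}) - E_u\right|/\sigma_u - A}{B}\right)\right], \tag{44}$$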



where $E_u$ and $\sigma_u^2$ are the unreweighted energy and variance of the set of configurations. $W'(\mathbf{R}) \approx 1$ for configurations 
such that $E_L^{\alpha}(\mathbf{R}) \approx E_u$, but $W'(\mathbf{R}) \to 0$ for configurations whose local energies are far from the mean. The 
parameter $A$ is the number of standard deviations of the energy beyond which configurations are excluded, while $B$ is 
the width of the region in which the effective weights fall off to zero (in terms of standard deviations of the energy). 
We typically chose $A$ to lie between 2 and 3 and $B$ to lie between 1/2 and 1. The effective weights $W'$ are used in 
place of the weights $W$ when the reweighted variance is evaluated and minimized. 
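The weight-limiting step can be sketched in a few lines. This is only an illustration, and it assumes that the effective weight has the smooth-cutoff form $\frac{1}{2}\{1 - \tanh[(|E_L - E_u|/\sigma_u - A)/B]\}$, which is consistent with the description of $A$ and $B$ given above. The function name `effective_weights` and the default values `A=2.5` and `B=0.75` are hypothetical choices taken from within the quoted ranges.

```python
import math

def effective_weights(local_energies, A=2.5, B=0.75):
    """Smooth cutoff weights for a list of local energies.

    Assumed form: W' = 0.5 * (1 - tanh((|E_L - E_u| / sigma_u - A) / B)),
    where E_u and sigma_u are the unweighted mean and standard deviation
    of the local energies in the configuration set.
    """
    n = len(local_energies)
    e_u = sum(local_energies) / n
    var_u = sum((e - e_u) ** 2 for e in local_energies) / (n - 1)
    sigma_u = math.sqrt(var_u)
    weights = []
    for e in local_energies:
        # Deviation from the mean in units of the standard deviation.
        z = abs(e - e_u) / sigma_u
        weights.append(0.5 * (1.0 - math.tanh((z - A) / B)))
    return weights
```

Configurations within a couple of standard deviations of the mean keep a weight close to one, while a configuration many standard deviations away is switched off smoothly rather than discarded abruptly.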

We have found that this weight-limiting scheme is capable of improving the stability of Jastrow-factor optimization 
when very small numbers of configurations are used. However, the energies of the resulting wave functions are not 
generally as good as the energies obtained using the same forms of wave function optimized with an adequate number 
of configurations. For large numbers of configurations, the limiting scheme has very little effect on the optimization 
of Jastrow factors. We conclude that the weight-limiting scheme is not of much practical benefit when a Jastrow 
factor is to be optimized; however, the scheme has been found to be very useful when parameters that affect the nodal 
surface are optimized. 

Other limiting schemes have been devised to improve the stability of the variance-minimization algorithms. For 
example, it is possible to combine the reweighted and unreweighted variance-minimization algorithms by limiting the 
values that the weights $W$ can take. Alternatively, the local energies themselves can be limited. The latter approach 
has been found to be problematic, as it can result in spurious minima in the variance corresponding to parameter sets 
for which a large number of local energies are limited. 

IX. SCALING OF THE VARIANCE-MINIMIZATION METHODS 

A. CPU time required for the optimization phase 

Let $N$ be the total number of electrons in a system, $P$ be the number of Jastrow parameters to be optimized, and 
$N_C$ be the number of configurations used to calculate the variance. Although the Jastrow factor of Ref. 6 is considered 
in this work, the conclusions reached should be valid for most other forms of Jastrow factor in current use. 

The computational effort required to evaluate the unreweighted variance using the accelerated method is independent 
of $N$ and $N_C$, but scales as $O(P^4)$. The time taken to compute the gradient of the variance is also $O(P^4)$. It 
may be assumed that the number of optimization steps required is independent of $P$, $N$, and $N_C$. The $O(P^4)$ scaling 
of the memory requirements of the accelerated method limits the number of parameters that can be optimized in a 
single calculation to between 100 and 200, depending on the available memory. 

The time taken to recompute the Jastrow factor and its derivatives after all of the parameters have changed is 
generally $O(N)$ for electron-nucleus and electron-electron-nucleus terms and $O(N^2)$ for electron-electron terms. The 
CPU time required to evaluate the variance (reweighted or unreweighted) using the standard procedure therefore 
increases as $O(N^2)$. The time taken to calculate the Jastrow factor is, in general, $O(P)$, and hence the time taken 
to calculate the variance using the standard method is also $O(P)$. Furthermore, each minimization step requires the 
gradient of the variance with respect to the parameters, which has $P$ components. The time taken to perform each 
iteration is therefore $O(P^2)$. Finally, because the variance is accumulated configuration by configuration, the CPU 
time for the standard method scales as $O(N_C)$. 

Putting this together, the CPU time for the optimization phase scales as $O(P^4)$ for the accelerated method and 
$O(N^2 P^2 N_C)$ for the standard method. It should be noted that the time required by the optimization phase in the 
accelerated scheme is completely negligible in comparison with the time required by the VMC coefficient-gathering 
phase, whereas the CPU time required by the optimization phase in the standard method is usually rather greater 
than the CPU time required by the VMC phase. 
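The way the individual factors combine can be written out explicitly. The symbols $T_{\mathrm{std}}$ and $T_{\mathrm{acc}}$, denoting the CPU time per optimization step in the standard and accelerated methods, are introduced here only for illustration and do not appear elsewhere in the text.

```latex
\begin{align*}
T_{\mathrm{std}} &\propto
  \underbrace{N_C}_{\text{configurations}} \times
  \underbrace{N^2 P}_{\text{Jastrow evaluation per configuration}} \times
  \underbrace{P}_{\text{gradient components}}
  = O(N^2 P^2 N_C), \\
T_{\mathrm{acc}} &\propto P^4
  \quad \text{(independent of $N$ and $N_C$)}.
\end{align*}
```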

B. The CPU time required for the gathering of the quartic coefficients in the accelerated scheme 

In the standard variance-minimization method, the CPU time required to generate the set of configurations used 
to compute the variance does not differ appreciably from the time taken to perform an ordinary VMC simulation. 
For the accelerated optimization method, however, the time taken to compute the quartic expansion coefficients can 
be a significant fraction of the total CPU time. 

The gathering of the quartic coefficients can be divided into two stages: (i) the evaluation of the Jastrow "basis 
functions" $f_i(\mathbf{R})$ for each configuration $\mathbf{R}$ (see Eq. (?)), and (ii) the calculation of the corresponding contributions 
to the $g$ and $G$ arrays. Stage (ii) scales as $O(P^4)$, but is independent of system size. By contrast, stage (i) scales 
as $O(P)$, because there are $P$ basis functions, but the scaling with system size is the same as that of evaluating the 
Jastrow factor: roughly $O(N^2)$. 

The CPU time for an ordinary VMC calculation is generally determined by the time taken to evaluate the orbitals 
in the Slater wave function. The computational effort required to carry out a fixed number of configuration moves 
grows as $O(N^2)$ if extended orbitals represented in a localized basis are used. The use of localized orbitals can improve 
this scaling to $O(N)$. In principle, stage (i) of the coefficient gathering will therefore take up an increasingly 
large fraction of the CPU time, but in practice the prefactor is so small that the time required is negligible even for 
the largest systems that we have studied. The time taken for stage (ii) can be the largest contribution to the CPU 
time for VMC simulations of small molecules, but the effort required is independent of system size, and so, overall, 
the coefficient-gathering phase of the accelerated scheme is more efficient for large systems than small systems. 

X. EFFICIENCY OF THE ACCELERATED OPTIMIZATION METHOD 

Timing results for the optimization of the linear Jastrow parameters for an H2O molecule (10 electrons) and a 
C26H32 molecule (136 electrons) are shown in Tables II and III, respectively. The calculations are fairly typical in 
terms of the number of parameters and number of configurations. In both cases the use of the accelerated optimization 
scheme essentially eliminates the cost of the optimization phase. In the standard method the cost of the optimization 
phase exceeds that of the VMC configuration-generation phase by an order of magnitude for H2O and by a less 
significant proportion for C26H32. The cost of the VMC phase in the accelerated scheme is increased substantially for 
H2O, although, overall, it is still much faster to use the accelerated scheme. For C26H32 the increase in the CPU time 
for configuration generation is negligible. Overall, the accelerated optimization scheme is 4.5 times faster for H2O 
and 2.3 times faster for C26H32. 

Method      Stage    CPU time (s) 
Standard    VMC           5669.43 
Standard    Opt.         58740.65 
Standard    Total        64410.08 
Accel.      VMC          14378.90 
Accel.      Opt.            39.69 
Accel.      Total        14418.59 

TABLE II: Timing results for ten cycles of a $6 \times 10^4$-configuration unreweighted variance minimization of a 38-linear-parameter 
Jastrow factor for an all-electron H2O molecule. The system contains a total of 10 electrons. The Slater wave function contained 
Hartree-Fock orbitals. The runs were carried out on a 1.7 GHz Pentium processor in a Sony Vaio laptop. 

Method      Stage    CPU time (s) 
Standard    VMC           6526.77 
Standard    Opt.          9323.53 
Standard    Total        15850.30 
Accel.      VMC           6828.47 
Accel.      Opt.            28.14 
Accel.      Total         6856.61 

TABLE III: Timing results for four cycles of a $1.6 \times 10^4$-configuration unreweighted variance minimization of a 
12-linear-parameter Jastrow factor for a C26H32 molecule with Troullier-Martins carbon and hydrogen pseudopotentials. The system 
contains a total of 136 electrons. The Slater wave function contained DFT-PBE orbitals. The runs were carried out on a cluster 
of eight 2.1 GHz Opteron processors. 
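The overall speedups quoted above follow directly from the CPU times reported in Tables II and III. The short check below reproduces the arithmetic; the dictionary layout and the function names `total` and `speedup` are illustrative only.

```python
# CPU times in seconds, copied from Tables II (H2O) and III (C26H32).
timings = {
    "H2O": {
        "standard": {"vmc": 5669.43, "opt": 58740.65},
        "accelerated": {"vmc": 14378.90, "opt": 39.69},
    },
    "C26H32": {
        "standard": {"vmc": 6526.77, "opt": 9323.53},
        "accelerated": {"vmc": 6828.47, "opt": 28.14},
    },
}

def total(system, method):
    """Total CPU time (VMC phase plus optimization phase)."""
    return sum(timings[system][method].values())

def speedup(system):
    """Ratio of total standard-method time to total accelerated-method time."""
    return total(system, "standard") / total(system, "accelerated")

for name in timings:
    print(name, round(speedup(name), 1))
```

The same data show that, in the standard method, the optimization phase costs roughly ten times as much as the VMC phase for H2O, but only about 1.4 times as much for C26H32.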

The actual time taken to compute the variance in the accelerated scheme is minute: on a 2.7 GHz Pentium 4 
processor, it takes an average of 83.6 μs to compute the variance with 25 parameters, while it takes 13.98 ms to 
compute the variance with 100 parameters. 
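These two timings give a rough empirical check on the expected $O(P^4)$ scaling of a variance evaluation. The sketch below estimates an effective exponent from the two quoted data points only, so it should be read as indicative rather than as a fit; the variable names are illustrative.

```python
import math

# Quoted timings for one variance evaluation in the accelerated scheme.
p_small, t_small = 25, 83.6e-6     # 25 parameters, 83.6 microseconds
p_large, t_large = 100, 13.98e-3   # 100 parameters, 13.98 milliseconds

# If t ~ P**x, then x = log(t_large / t_small) / log(p_large / p_small).
exponent = math.log(t_large / t_small) / math.log(p_large / p_small)
print(f"effective exponent: {exponent:.2f}")
```

The effective exponent comes out a little below 4, which is what one would expect if lower-order terms still contribute noticeably at 25 parameters.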

XI. CONCLUSIONS 

We have introduced a new scheme for evaluating the unreweighted variance of the VMC energy, which greatly 
accelerates the optimization of parameters that occur in a linear fashion in the exponent of a Jastrow factor. This 
scheme is very efficient because it uses the property that the unreweighted variance is a quartic function of such 
parameters. We studied a wide range of systems and found that the unreweighted variance almost invariably has a 
single minimum in the space of the linear parameters. The only exceptions to this that we could find occurred when 
the configuration space was very poorly sampled. For other wave-function parameters, however, the unreweighted 
variance often has more than one minimum. 

It is easy to use very large numbers of configurations to perform optimizations using our accelerated scheme. We 
have investigated the effect of varying the number of configurations on the wave-function quality, and we have found 
that there is, in general, no significant benefit to be obtained from using more than about $10^4$ configurations when 
optimizing linear Jastrow parameters. 

We have considered various wave-function optimization schemes using correlated-sampling approaches for 
minimizing the energy and the variance of the energy. Reweighted energy and variance minimization using correlated 
sampling suffer from numerical instabilities due to fluctuations in the values of the weights, which are severe for large 
systems. The unreweighted energy always has a stationary point at the wave function used to generate the 
configuration set, and for parameters which occur linearly in the Jastrow factor this stationary point is the global maximum 
in the energy. The unreweighted energy is therefore not a suitable cost function for wave-function optimization. The 
minima of the variance, the unreweighted variance (iterated to self-consistency), and the energy are generally distinct. 
In various model systems that we have studied, the self-consistent minimum in the unreweighted variance always gave 
lower energies than the minimum in the reweighted variance. 

APPENDIX A: CONSTRUCTING THE QUARTIC POLYNOMIAL CORRESPONDING TO A LINE IN PARAMETER SPACE 

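Consider the expression for the quartic unreweighted variance as a function of the linear parameters [Eq. (34)], 
and consider a line in parameter space 

$$\boldsymbol{\alpha}(t) = \mathbf{A} + \mathbf{B}t, \tag{A1}$$

where $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_P)$ and $\mathbf{A}$ and $\mathbf{B}$ are constant vectors. The unreweighted variance along the line is given by 

$$\sigma_u^2(t) = \frac{N_C}{N_C - 1}\left(\Omega_4 t^4 + \Omega_3 t^3 + \Omega_2 t^2 + \Omega_1 t + \Omega_0\right), \tag{A2}$$

where 

$$\Omega_4 = \sum_{i=1}^{P} B_i \sum_{j=i}^{P} B_j \sum_{k=j}^{P} B_k \sum_{l=k}^{P} B_l \Gamma^{(4)}_{ijkl}, \tag{A3}$$

$$\begin{aligned}
\Omega_3 ={}& \sum_{i=1}^{P} A_i \sum_{j=i}^{P} B_j \sum_{k=j}^{P} B_k \sum_{l=k}^{P} B_l \Gamma^{(4)}_{ijkl}
 + \sum_{i=1}^{P} B_i \sum_{j=i}^{P} A_j \sum_{k=j}^{P} B_k \sum_{l=k}^{P} B_l \Gamma^{(4)}_{ijkl} \\
&+ \sum_{i=1}^{P} B_i \sum_{j=i}^{P} B_j \sum_{k=j}^{P} A_k \sum_{l=k}^{P} B_l \Gamma^{(4)}_{ijkl}
 + \sum_{i=1}^{P} B_i \sum_{j=i}^{P} B_j \sum_{k=j}^{P} B_k \left(\Gamma^{(3)}_{ijk} + \sum_{l=k}^{P} A_l \Gamma^{(4)}_{ijkl}\right),
\end{aligned} \tag{A4}$$

$$\begin{aligned}
\Omega_2 ={}& \sum_{i=1}^{P} A_i \sum_{j=i}^{P} A_j \sum_{k=j}^{P} B_k \sum_{l=k}^{P} B_l \Gamma^{(4)}_{ijkl}
 + \sum_{i=1}^{P} A_i \sum_{j=i}^{P} B_j \sum_{k=j}^{P} A_k \sum_{l=k}^{P} B_l \Gamma^{(4)}_{ijkl} \\
&+ \sum_{i=1}^{P} A_i \sum_{j=i}^{P} B_j \sum_{k=j}^{P} B_k \left(\Gamma^{(3)}_{ijk} + \sum_{l=k}^{P} A_l \Gamma^{(4)}_{ijkl}\right)
 + \sum_{i=1}^{P} B_i \sum_{j=i}^{P} A_j \sum_{k=j}^{P} A_k \sum_{l=k}^{P} B_l \Gamma^{(4)}_{ijkl} \\
&+ \sum_{i=1}^{P} B_i \sum_{j=i}^{P} A_j \sum_{k=j}^{P} B_k \left(\Gamma^{(3)}_{ijk} + \sum_{l=k}^{P} A_l \Gamma^{(4)}_{ijkl}\right) \\
&+ \sum_{i=1}^{P} B_i \sum_{j=i}^{P} B_j \left[\Gamma^{(2)}_{ij} + \sum_{k=j}^{P} A_k \left(\Gamma^{(3)}_{ijk} + \sum_{l=k}^{P} A_l \Gamma^{(4)}_{ijkl}\right)\right],
\end{aligned} \tag{A5}$$

$$\begin{aligned}
\Omega_1 ={}& \sum_{i=1}^{P} B_i \left\{\Gamma^{(1)}_{i} + \sum_{j=i}^{P} A_j \left[\Gamma^{(2)}_{ij} + \sum_{k=j}^{P} A_k \left(\Gamma^{(3)}_{ijk} + \sum_{l=k}^{P} A_l \Gamma^{(4)}_{ijkl}\right)\right]\right\} \\
&+ \sum_{i=1}^{P} A_i \sum_{j=i}^{P} B_j \left[\Gamma^{(2)}_{ij} + \sum_{k=j}^{P} A_k \left(\Gamma^{(3)}_{ijk} + \sum_{l=k}^{P} A_l \Gamma^{(4)}_{ijkl}\right)\right] \\
&+ \sum_{i=1}^{P} A_i \sum_{j=i}^{P} A_j \sum_{k=j}^{P} B_k \left(\Gamma^{(3)}_{ijk} + \sum_{l=k}^{P} A_l \Gamma^{(4)}_{ijkl}\right)
 + \sum_{i=1}^{P} A_i \sum_{j=i}^{P} A_j \sum_{k=j}^{P} A_k \sum_{l=k}^{P} B_l \Gamma^{(4)}_{ijkl},
\end{aligned} \tag{A6}$$

$$\Omega_0 = \Gamma^{(0)} + \sum_{i=1}^{P} A_i \left\{\Gamma^{(1)}_{i} + \sum_{j=i}^{P} A_j \left[\Gamma^{(2)}_{ij} + \sum_{k=j}^{P} A_k \left(\Gamma^{(3)}_{ijk} + \sum_{l=k}^{P} A_l \Gamma^{(4)}_{ijkl}\right)\right]\right\}. \tag{A7}$$

All the terms that appear in Eqs. (A3)-(A7) can be evaluated within a single loop over $i$, $j$, $k$, and $l$. 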
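The single-loop evaluation can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: it assumes the quartic has the nested form $\Gamma^{(0)} + \sum_i \alpha_i \Gamma^{(1)}_i + \sum_{i \le j} \alpha_i \alpha_j \Gamma^{(2)}_{ij} + \ldots$ with ordered indices, it stores the coefficients in dictionaries keyed by index tuples (a hypothetical layout chosen for readability), and it omits the overall normalization prefactor. Instead of coding each term separately, the sketch multiplies the linear factors $A_i + B_i t$ together as the loops descend, which accumulates the same five line coefficients.

```python
def poly_times_linear(p, a, b):
    """Multiply polynomial p (ascending coefficients in t) by (a + b*t)."""
    out = [0.0] * (len(p) + 1)
    for n, c in enumerate(p):
        out[n] += c * a
        out[n + 1] += c * b
    return out

def add_scaled(omega, p, scale):
    """Accumulate scale * p into omega, coefficient by coefficient."""
    for n, c in enumerate(p):
        omega[n] += scale * c

def line_coefficients(A, B, g0, g1, g2, g3, g4):
    """Return [Omega_0, ..., Omega_4] for the quartic along alpha(t) = A + B*t."""
    P = len(A)
    omega = [float(g0), 0.0, 0.0, 0.0, 0.0]
    for i in range(P):
        p1 = [A[i], B[i]]                         # alpha_i(t)
        add_scaled(omega, p1, g1[i])
        for j in range(i, P):
            p2 = poly_times_linear(p1, A[j], B[j])
            add_scaled(omega, p2, g2[i, j])
            for k in range(j, P):
                p3 = poly_times_linear(p2, A[k], B[k])
                add_scaled(omega, p3, g3[i, j, k])
                for l in range(k, P):
                    p4 = poly_times_linear(p3, A[l], B[l])
                    add_scaled(omega, p4, g4[i, j, k, l])
    return omega

def quartic(alpha, g0, g1, g2, g3, g4):
    """Direct evaluation of the quartic at a point, for cross-checking."""
    P = len(alpha)
    total = float(g0)
    for i in range(P):
        total += alpha[i] * g1[i]
        for j in range(i, P):
            total += alpha[i] * alpha[j] * g2[i, j]
            for k in range(j, P):
                total += alpha[i] * alpha[j] * alpha[k] * g3[i, j, k]
                for l in range(k, P):
                    total += (alpha[i] * alpha[j] * alpha[k] * alpha[l]
                              * g4[i, j, k, l])
    return total
```

Once the five coefficients are known, each evaluation of the variance along the line costs only a handful of floating-point operations, so a line minimization no longer needs to revisit the coefficient arrays.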